Fault-Tolerance Implementation in Typical Distributed Stream Processing Systems
Authors
Abstract
Typical training simulation systems use a distributed network architecture built from personal computers because of cost, extensibility, and maintenance considerations. In such a design, a failure or error on any single computer during operation can easily disrupt the functions of the entire system. Adopting an appropriate fault-tolerance mechanism, one that keeps the whole system operating normally when a subsystem computer behaves abnormally, is therefore an important consideration in designing training simulation systems. Because a firearms training simulation system transmits and processes large volumes of streaming data, it can be regarded as a typical distributed stream processing system. In this paper, we examined fault-tolerance mechanism designs and techniques for distributed stream processing systems and applied them to a typical firearms training simulation system to increase its operational reliability and availability. We implemented the fault-tolerance processing program using the transparent checkpoint method. The results of single-machine fault-tolerance tests and multi-machine synchronized fault-tolerance tests indicate that both the checkpoint creation time and the rollback recovery time satisfy the system's operational requirements.
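The checkpoint/rollback cycle the abstract describes can be illustrated with a small sketch. This is an application-level illustration under stated assumptions, not the paper's transparent checkpoint method (transparent checkpointing typically captures process state without changes to the application). The names `OperatorState`, `Node`, `recovery_line`, and `coordinated_rollback` are hypothetical, the in-memory dictionary stands in for stable storage, and the "latest epoch every node has saved" rule is one simple way to pick a consistent recovery point for the multi-machine synchronized case. It is not claimed to be the paper's protocol.

```python
import copy
from dataclasses import dataclass


@dataclass
class OperatorState:
    """Hypothetical per-node state of a stateful stream operator."""
    offset: int = 0      # index of the next input record to consume
    count: int = 0       # records processed so far
    total: float = 0.0   # running sum of the records


class Node:
    """One simulated machine with an in-memory checkpoint store.

    Assumption: the dict stands in for stable storage; a real system
    would persist checkpoints to disk or another machine.
    """

    def __init__(self, name: str) -> None:
        self.name = name
        self.state = OperatorState()
        self.checkpoints: dict[int, OperatorState] = {}
        self.checkpoint(0)  # epoch 0 = initial state, so recovery always has a target

    def process(self, stream: list[float], n: int) -> None:
        # Consume up to n records starting at the current offset.
        end = min(self.state.offset + n, len(stream))
        for i in range(self.state.offset, end):
            self.state.count += 1
            self.state.total += stream[i]
        self.state.offset = end

    def checkpoint(self, epoch: int) -> None:
        # Deep copy so later processing cannot mutate the saved snapshot.
        self.checkpoints[epoch] = copy.deepcopy(self.state)

    def rollback(self, epoch: int) -> None:
        # Restore a private copy so the checkpoint itself stays intact.
        self.state = copy.deepcopy(self.checkpoints[epoch])


def recovery_line(nodes: list[Node]) -> int:
    """Latest epoch for which every node holds a checkpoint (nodes must be non-empty)."""
    common = set.intersection(*(set(n.checkpoints) for n in nodes))
    return max(common)


def coordinated_rollback(nodes: list[Node]) -> int:
    """Roll all nodes back to the latest globally consistent checkpoint."""
    epoch = recovery_line(nodes)
    for n in nodes:
        n.rollback(epoch)
    return epoch


# Usage: two nodes checkpoint epoch 1 together; node "b" fails before epoch 2.
stream = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
nodes = [Node("a"), Node("b")]
for n in nodes:
    n.process(stream, 2)
    n.checkpoint(1)            # offset 2, total 3.0
for n in nodes:
    n.process(stream, 2)
nodes[0].checkpoint(2)         # only node "a" reaches epoch 2
epoch = coordinated_rollback(nodes)   # both nodes return to epoch 1
for n in nodes:
    n.process(stream, len(stream))    # replay from offset 2 to the end
```

After the replay, each node has processed all six records exactly once (count 6, total 21.0): work done after epoch 1 is discarded and recomputed from the restored offset, which is what makes rollback recovery correct for a deterministic operator.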
Similar resources
Toward High-Performance Distributed Stream Processing via Approximate Fault Tolerance
Fault tolerance is critical for distributed stream processing systems, yet achieving error-free fault tolerance often incurs substantial performance overhead. We present AF-Stream, a distributed stream processing system that addresses the trade-off between performance and accuracy in fault tolerance. AF-Stream builds on a notion called approximate fault tolerance, whose idea is to mitigate back...
A Quality-Centric Data Model for Distributed Stream Management Systems
It is challenging for large-scale stream management systems to always return perfect results when processing data streams originating from distributed sources. Data sources and intermediate processing nodes may fail during the lifetime of a stream query. In addition, individual nodes may become overloaded due to processing demands. In practice, users have to accept incomplete or inaccurate quer...
When Stream Processing crosses MapReduce
Although Event Stream Processing (ESP) systems have existed for more than a decade, we are now witnessing a true renaissance of ESP systems that have adopted the popular MapReduce paradigm. In this white paper, we advocate for the StreamMapReduce approach as it allows a (i) quick and easy transition of legacy MapReduce-based applications to ESP, (ii) simplifies the implementation of fault toleran...
Fault-tolerant stream processing using a distributed, replicated file system
We present SGuard, a new fault-tolerance technique for distributed stream processing engines (SPEs) running in clusters of commodity servers. SGuard is less disruptive to normal stream processing and leaves more resources available for normal stream processing than previous proposals. Like several previous schemes, SGuard is based on rollback recovery [18]: it checkpoints the state of stream pr...
Fault tolerance for stream programs on parallel platforms
A distributed system is defined as a collection of autonomous computers connected by a network, and with the appropriate distributed software for the system to be seen by users as a single entity capable of providing computing facilities. Distributed systems with centralised control have a distinguished control node, called leader node. The main role of a leader node is to distribute and manage...
Journal title:
- J. Inf. Sci. Eng.
Volume: 30; Issue:
Pages: -
Publication date: 2014